model data
RynnVLA-002: A Unified Vision-Language-Action and World Model
Cen, Jun, Huang, Siteng, Yuan, Yuqian, Li, Kehan, Yuan, Hangjie, Yu, Chaohui, Jiang, Yuming, Guo, Jiayan, Li, Xin, Luo, Hao, Wang, Fan, Zhao, Deli, Chen, Hao
We introduce RynnVLA-002, a unified Vision-Language-Action (VLA) and world model. The world model leverages action and visual inputs to predict future image states, learning the underlying physics of the environment to refine action generation. Conversely, the VLA model produces subsequent actions from image observations, enhancing visual understanding and supporting the world model's image generation. The unified framework of RynnVLA-002 enables joint learning of environmental dynamics and action planning. Our experiments show that RynnVLA-002 surpasses individual VLA and world models, demonstrating their mutual enhancement. We evaluate RynnVLA-002 on both simulation and real-world robot tasks. RynnVLA-002 achieves a 97.4% success rate on the LIBERO simulation benchmark without pretraining, while in real-world LeRobot experiments its integrated world model boosts the overall success rate by 50%.
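The abstract describes two coupled prediction directions sharing one visual backbone: the world model maps current visual features plus an action to future visual features, while the VLA head maps visual features to an action. A minimal numpy sketch of this coupling (the random linear maps, dimensions, and function names are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared visual feature extractor: a random linear map stands in for a
# vision backbone; all dimensions are illustrative.
W_feat = rng.normal(size=(16, 8))

def encode(image):
    # image: (16,) pixel vector -> (8,) shared features
    return np.tanh(image @ W_feat)

# Action head (the "VLA" direction): features -> action.
W_act = rng.normal(size=(8, 4))
def predict_action(image):
    return encode(image) @ W_act

# World-model head: features + action -> predicted next-frame features.
W_dyn = rng.normal(size=(8 + 4, 8))
def predict_next_features(image, action):
    return np.concatenate([encode(image), action]) @ W_dyn

image = rng.normal(size=16)
action = predict_action(image)                      # VLA direction
next_feat = predict_next_features(image, action)    # world-model direction
```

Because both heads read the same `encode` features, gradients from the world-model objective and the action objective would flow into one backbone, which is the joint-learning structure the abstract credits for the mutual enhancement.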
WorldVLA: Towards Autoregressive Action World Model
Cen, Jun, Yu, Chaohui, Yuan, Hangjie, Jiang, Yuming, Huang, Siteng, Guo, Jiayan, Li, Xin, Song, Yibing, Luo, Hao, Wang, Fan, Zhao, Deli, Chen, Hao
We present WorldVLA, an autoregressive action world model that unifies action and image understanding and generation. WorldVLA integrates a Vision-Language-Action (VLA) model and a world model in a single framework. The world model predicts future images by leveraging both action and image understanding, with the purpose of learning the underlying physics of the environment to improve action generation. Meanwhile, the action model generates subsequent actions based on image observations, aiding visual understanding and, in turn, the visual generation of the world model. We demonstrate that WorldVLA outperforms standalone action and world models, highlighting the mutual enhancement between the two. In addition, we find that the performance of the action model deteriorates when generating sequences of actions autoregressively. This phenomenon can be attributed to the model's limited generalization capability for action prediction, which propagates errors from earlier actions to subsequent ones. To address this issue, we propose an attention mask strategy that selectively masks prior actions during the generation of the current action, which yields a significant performance improvement in the action chunk generation task.
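The attention mask strategy can be made concrete: within an action chunk, each action token is allowed to attend to all observation tokens and to itself, but not to previously generated action tokens, so an early action error cannot propagate through attention. A small numpy sketch (the token layout and function name are assumptions for illustration, not the paper's implementation):

```python
import numpy as np

def action_chunk_mask(n_obs, n_act):
    """Boolean attention mask (True = may attend) for a token sequence
    laid out as [obs_0 .. obs_{n_obs-1}, act_0 .. act_{n_act-1}]."""
    n = n_obs + n_act
    mask = np.tril(np.ones((n, n), dtype=bool))  # standard causal mask
    # Selectively mask prior actions: an action token keeps access to all
    # observation tokens and itself, but not to earlier action tokens.
    for i in range(n_obs, n):
        for j in range(n_obs, i):
            mask[i, j] = False
    return mask

m = action_chunk_mask(n_obs=2, n_act=3)
```

With `n_obs=2, n_act=3`, row 3 (the second action) attends to tokens 0, 1, and 3 only: the observations and itself, never action token 2.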
DOMAIN: MilDly COnservative Model-BAsed OfflINe Reinforcement Learning
Liu, Xiao-Yin, Zhou, Xiao-Hu, Xie, Xiao-Liang, Liu, Shi-Qi, Feng, Zhen-Qiu, Li, Hao, Gui, Mei-Jiang, Xiang, Tian-Yu, Huang, De-Xing, Hou, Zeng-Guang
Model-based reinforcement learning (RL), which learns an environment model from an offline dataset and generates additional out-of-distribution model data, has become an effective approach to the distribution-shift problem in offline RL. Because of the gap between the learned and actual environments, conservatism should be incorporated into the algorithm to balance accurate offline data against imprecise model data. The conservatism of current algorithms mostly relies on model uncertainty estimation. However, uncertainty estimation is unreliable and leads to poor performance in certain scenarios, and previous methods ignore differences among the model data, which introduces excessive conservatism. This paper therefore proposes a milDly cOnservative Model-bAsed offlINe RL algorithm (DOMAIN) that addresses these issues without estimating model uncertainty. DOMAIN introduces an adaptive sampling distribution over model samples, which adaptively adjusts the penalty on model data. We theoretically demonstrate that the Q value learned by DOMAIN outside the offline data region is a lower bound of the true Q value, that DOMAIN is less conservative than previous model-based offline RL algorithms, and that it carries a safe policy improvement guarantee. Extensive experiments show that DOMAIN outperforms prior RL algorithms on the D4RL benchmark and achieves better performance than other RL algorithms on tasks that require generalization.
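The core idea of a data-dependent penalty on model-generated transitions, rather than a uniform uncertainty-based one, can be sketched loosely: the adaptive sampling distribution assigns each model sample a weight, and samples with larger weight receive a larger penalty on their Bellman targets. The function below is an illustrative toy reading of that idea, not DOMAIN's actual formulation; all names and constants are assumptions.

```python
import numpy as np

def penalized_targets(rewards, q_next, sample_weights, gamma=0.99, beta=5.0):
    """Bellman targets for model-generated transitions with an adaptive
    per-sample penalty. `sample_weights` stands in for an adaptive
    sampling distribution: samples it emphasizes more are penalized
    more, lowering their Q targets (toy sketch only)."""
    penalty = beta * sample_weights / sample_weights.sum()
    return rewards - penalty + gamma * q_next

rewards = np.array([1.0, 1.0, 1.0])
q_next = np.array([0.0, 0.0, 0.0])
weights = np.array([0.1, 0.3, 0.6])   # illustrative sampling distribution
targets = penalized_targets(rewards, q_next, weights)
# identical transitions end up with different targets, so the
# conservatism varies across model data instead of being uniform
```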
- Asia > Macao (0.14)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States (0.04)
High-precision interpolation of stellar atmospheres with a deep neural network using a 1D convolutional auto encoder for feature extraction
Plaza, C. Westendorp, Ramos, A. Asensio, Prieto, C. Allende
Given the widespread availability of grids of models for stellar atmospheres, it is necessary to recover intermediate atmospheric models by means of accurate techniques that go beyond simple linear interpolation and capture the intricacies of the data. Our goal is to establish a reliable, precise, lightweight, and fast method for recovering stellar model atmospheres, that is, the stratification of mass column, temperature, gas pressure, and electron density with optical depth, given any combination of the defining atmospheric parameters: metallicity, effective temperature, and surface gravity, as well as the abundances of other key chemical elements. We employed a fully connected deep neural network which in turn uses a 1D convolutional auto-encoder to extract the nonlinearities of a grid built from the ATLAS9 and MARCS model atmospheres. This new method, which we call iNNterpol, effectively accounts for the nonlinear relationships in the data, in contrast to traditional machine-learning methods, such as the light gradient-boosting machine (LightGBM), that are often used for their speed in well-known competitions with reduced datasets. We show that a convolutional auto-encoder achieves higher precision than principal component analysis as a feature extractor. We believe iNNterpol constitutes a useful tool for generating fast and precise stellar model atmospheres, mitigating convergence issues, as well as a framework for future developments. The code and data for both training and direct interpolation are available online at https://github.com/cwestend/iNNterpol for full reproducibility and as a practical starting point for other continuous 1D data in this field and elsewhere.
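As a point of reference, the "simple linear interpolation" baseline that iNNterpol aims to improve upon fits in a few lines: interpolate a stratification layer by layer between two neighboring grid models. The stratifications below are made-up toy values for illustration, not ATLAS9 or MARCS data.

```python
import numpy as np

# Two toy model atmospheres from a grid: temperature vs. log optical
# depth, at effective temperatures 5000 K and 6000 K (invented values).
tau = np.linspace(-4, 2, 7)           # log optical depth sampling
T_5000 = 4000 + 600 * (tau + 4)       # toy temperature stratification
T_6000 = 4800 + 700 * (tau + 4)

# Plain linear interpolation to Teff = 5500 K, layer by layer.
frac = (5500 - 5000) / (6000 - 5000)
T_5500_linear = (1 - frac) * T_5000 + frac * T_6000
```

This treats every depth point independently and linearly in each parameter; the paper's argument is that the true dependence of the stratifications on the atmospheric parameters is nonlinear, which is what the auto-encoder features let the network capture.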
- Europe > Spain > Canary Islands > Tenerife (0.04)
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
- North America > United States > Nevada > Clark County > Las Vegas (0.04)
PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management
Fang, Jiarui, Zhu, Zilin, Li, Shenggui, Su, Hui, Yu, Yang, Zhou, Jie, You, Yang
The pre-trained model (PTM) is revolutionizing Artificial Intelligence (AI) technology. However, the hardware requirements of PTM training are prohibitively high, making it accessible to only a small proportion of people. We therefore propose the PatrickStar system to lower the hardware requirements of PTMs and make them accessible to everyone. PatrickStar uses the CPU-GPU heterogeneous memory space to store the model data. Unlike existing works, we organize the model data in memory chunks and dynamically distribute them across the heterogeneous memory. Guided by runtime memory statistics collected in a warm-up iteration, chunks are orchestrated efficiently in heterogeneous memory, yielding lower CPU-GPU data transmission volume and higher bandwidth utilization. In symbiosis with the Zero Redundancy Optimizer, PatrickStar scales to multiple GPUs on multiple nodes using data parallelism. The system can train bigger models with larger batch sizes, which existing works cannot accomplish. Experimental results show that PatrickStar extends model scales to 2.27 and 2.5 times those of DeepSpeed, and consistently exhibits significantly higher execution speed. PatrickStar also successfully runs the 175B-parameter GPT-3 training task on a 32-GPU cluster. Our code is publicly available at https://github.com/Tencent/PatrickStar.
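The chunk-based placement idea can be sketched as a small manager: model data lives in fixed-size chunks, and before an operator runs, the chunks it needs are moved to GPU, evicting least-recently-used chunks when a GPU budget (in PatrickStar, informed by warm-up statistics) would be exceeded. This is a hypothetical toy sketch, not PatrickStar's actual implementation; all class and method names are invented.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    chunk_id: int
    nbytes: int
    device: str = "cpu"          # chunks start in CPU memory

class ChunkManager:
    """Toy chunk placement with an LRU eviction policy and a fixed
    GPU memory budget (illustrative sketch only)."""
    def __init__(self, gpu_budget):
        self.gpu_budget = gpu_budget
        self.chunks = {}
        self.gpu_lru = []        # chunk ids on GPU, oldest first

    def register(self, chunk):
        self.chunks[chunk.chunk_id] = chunk

    def fetch(self, chunk_id):
        """Ensure a chunk is on GPU before its operator runs."""
        chunk = self.chunks[chunk_id]
        if chunk.device == "gpu":
            self.gpu_lru.remove(chunk_id)    # refresh LRU position
            self.gpu_lru.append(chunk_id)
            return chunk
        # Evict oldest chunks until this one fits within the budget.
        while self._gpu_used() + chunk.nbytes > self.gpu_budget:
            victim = self.chunks[self.gpu_lru.pop(0)]
            victim.device = "cpu"
        chunk.device = "gpu"
        self.gpu_lru.append(chunk_id)
        return chunk

    def _gpu_used(self):
        return sum(self.chunks[c].nbytes for c in self.gpu_lru)

mgr = ChunkManager(gpu_budget=100)
mgr.register(Chunk(0, 60))
mgr.register(Chunk(1, 60))
mgr.fetch(0)
mgr.fetch(1)   # exceeds the 100-byte budget, so chunk 0 is evicted to CPU
```

Managing whole chunks rather than individual tensors is what amortizes transfer cost: one large contiguous copy uses PCIe bandwidth far better than many small ones.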
Airborne LiDAR-assisted deep learning methodology for riparian land cover classification using aerial photographs and its application for flood modelling
In response to challenges in land cover classification (LCC), many researchers have recently experimented with classification methods based on artificial intelligence techniques. For LCC mapping of the vegetated Asahi River in Japan, the current study uses the deep learning (DL)-based DeepLabV3 module for image segmentation of aerial photographs. We modified the existing model by concatenating data on its resultant output port to access the airborne laser bathymetry (ALB) dataset, including voxel-based laser points and vegetation height. Findings revealed that the modified approach greatly improved the accuracy of LCC compared to our earlier unsupervised ALB-based method, with 25% and 35% improvements, respectively, in overall accuracy and the macro F1-score for the November 2017 dataset (no-leaf condition). Finally, by estimating flow-resistance parameters in flood modelling using LCC mapping-derived data, we conclude that the upgraded DL methodology produces a better fit between numerically analyzed and observed peak water levels.
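The data-fusion step rests on concatenating ALB-derived rasters with the imagery stream. The paper attaches the ALB data at the network's output port; the sketch below shows only the generic channel-concatenation operation, with invented array names and shapes, as a minimal illustration of fusing LiDAR-derived layers with an image tile.

```python
import numpy as np

# Aerial photograph tile: H x W x 3 (RGB), plus ALB-derived rasters
# gridded to the same resolution (all values here are placeholders).
rgb = np.zeros((64, 64, 3), dtype=np.float32)
veg_height = np.ones((64, 64, 1), dtype=np.float32)          # vegetation height
point_density = np.full((64, 64, 1), 0.5, dtype=np.float32)  # voxel point density

# Stack the LiDAR-derived channels onto the image so the classifier
# sees spectral and structural information per pixel.
x = np.concatenate([rgb, veg_height, point_density], axis=-1)
```

Vegetation height is exactly the cue that RGB alone cannot provide in a no-leaf condition, which is consistent with the large accuracy gain the study reports for the November dataset.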
Machine learning: Apache Flink ML 2.0 opens for Python - Market Research Telecast
The team behind Apache Flink has released Apache Flink ML in version 2.0, a companion library for machine learning built on the stream-processing framework. Apache Flink ML provides both APIs and infrastructure for building stream-batch unified ML algorithms, designed to be easy to use and to deliver near-real-time latency. The release is intended to extend Apache Flink to new machine learning use cases, in particular real-time ML scenarios.
Global Big Data Conference
Aquarium, a startup from two former Cruise employees, wants to help companies refine their machine learning model data more easily and move the models into production faster. Today the company announced a $2.6 million seed led by Sequoia with participation from Y Combinator and a number of angel investors, including Cruise co-founders Kyle Vogt and Dan Kan. When the two co-founders, CEO Peter Gao and head of engineering Quinn Johnson, were at Cruise, they learned that finding areas of weakness in the model data was often the problem that prevented a model from getting into production. Aquarium aims to solve this issue. "Aquarium is a machine learning data management system that helps people improve model performance by improving the data that it's trained on, which is usually the most important part of making the model work in production," Gao told me.
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.40)
How Do You Test AI Systems?
Everyone who has ever worked on an application development project knows that you don't simply put code and content into production, to your customers, employees, or stakeholders, without first testing it to make sure it isn't broken or dead on delivery. Quality Assurance (QA) is such a core part of any technology or business delivery that it's one of the essential components of any development methodology. And the best way to do all this is in an agile fashion, in small, iterative chunks, so you can respond to the continuously evolving and changing needs of the customer. Surely AI projects are no different. There are iterative design, development, testing, and delivery phases, as we've discussed in our previous content on AI methodologies.
When Is it Okay to Use Data for AI?
Developing AI requires a lot of data and, in many cases, this data comes from third parties. But organizations willing to share data for computational uses have not had easy-to-use licenses for distributing data. Many common licenses, such as the Creative Commons licenses, were developed without consideration for how data could be used for machine learning. The absence of model data-sharing agreements has deterred many data owners who would otherwise be eager to share their data, hindering AI development. To address this problem, Microsoft has published three model data use agreements designed to specify whether and how data can be used for AI development.